Document Indexing Using Independent Topic Extraction

Authors

  • Yu-Hwan Kim
  • Byoung-Tak Zhang
Abstract

Text retrieval involves finding relevant information in a collection of documents given the user's information need. Traditional information retrieval systems represent a document as a vector of words, where each component is either a simple word count or the result of a more sophisticated weighting scheme. The resulting term-document matrix is usually sparse, and some words are highly correlated with each other. Another representation scheme focuses on the underlying topics, which are usually obtained by one of a variety of dimension reduction techniques, so that each document can be represented as a vector of topic intensities. In this paper, we propose a novel indexing technique based on independent component analysis (ICA). In experiments performed on AP news articles, the performance improvement is significant when the topic of the query is closely related to the topics extracted by ICA.
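The word-vector representation described in the abstract can be illustrated with a minimal sketch. It builds raw word counts and a weighted variant for three toy documents. The whitespace tokenizer, the `log(N / df)` weighting, the sample sentences, and the function name `build_term_document_matrix` are all assumptions made for illustration; the abstract does not say which weighting scheme the authors used, and the ICA step itself is not shown here.

```python
import math
from collections import Counter


def build_term_document_matrix(docs):
    """Return (vocabulary, count_rows, tfidf_rows) for a list of raw strings.

    Hypothetical helper: one row per document, one column per vocabulary term.
    """
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    count_rows, tfidf_rows = [], []
    for toks in tokenized:
        counts = Counter(toks)
        # Simple word-count representation.
        count_rows.append([counts[t] for t in vocabulary])
        # One possible weighting scheme: count * log(N / df).
        tfidf_rows.append(
            [counts[t] * math.log(n_docs / df[t]) for t in vocabulary]
        )
    return vocabulary, count_rows, tfidf_rows


# Toy documents, invented for this example.
docs = [
    "stocks fell as markets closed",
    "markets rallied as stocks rose",
    "the team won the final game",
]
vocab, counts, tfidf = build_term_document_matrix(docs)
```

Even in this tiny example most entries of each count row are zero, and "stocks" and "markets" appear together in the same documents, which mirrors the sparsity and word correlation the abstract mentions as motivation for topic-based representations.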


Similar resources

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...


Identifying Features from Opinion Mining Using Fine-Grained Relational Topic Weighted Approach

Opinion feature extraction is a subproblem of opinion mining analyzed at document, sentence, or even phrase (word) levels. Document-level (sentence-level) opinion mining is classified as overall subjectivity or sentiment, expressed in an individual review document. The existing approaches to opinion feature extraction depended on mining patterns from a particular evaluate corpus disregard non...


Document Summarization Retrieval System Based on Web User Needs

Existing models for document summarization mostly use the similarity between sentences in the document to extract the most salient sentences. The documents as well as the sentences are indexed using traditional term indexing measures, which do not take the context into consideration. Therefore, the sentence similarity values remain independent of the context. In this paper, we propose a context...


Mining Concepts from Texts

The extraction of multi-word relevant expressions has been an increasingly hot topic in the last few years. Relevant expressions are applicable in diverse areas such as Information Retrieval, document clustering, or classification and indexing of documents. However, relevant single-words, which represent much of the knowledge in texts, have been a relatively dormant field. In this paper we pres...


Correlation Preserved Indexing Based Approach For Document Clustering

Document clustering is the act of collecting similar documents into clusters, where similarity is some function on a document. Document clustering method achieves 1) a high accuracy for documents 2) document frequency can be calculated 3) term weight is calculated with the term frequency vector. Document clustering is closely related to the concept of data clustering. Document clustering is a m...




Publication date: 2001